We Claim: 



1. A method for run-time prediction of a next caller of a shared functional 
unit, wherein the shared functional unit is operable to be called by two or more callers out 
of a plurality of callers, the method comprising: 

detecting a calling pattern of the plurality of callers of the shared functional unit; 
predicting the next caller out of the plurality of callers of the shared functional 
unit; and 

loading state information associated with the next caller out of the plurality of callers;

wherein the shared functional unit and the plurality of callers are operable to 
execute in parallel on a parallel execution unit. 

2. The method of claim 1, wherein the run-time prediction is performed for 
an application described by a dataflow graph. 

3. The method of claim 1, wherein the run-time prediction is performed for 
an application programmed in a dataflow language. 

4. The method of claim 1, further comprising: 
storing a caller history of the shared functional unit; 
wherein said detecting comprises: 

dividing the caller history into a first portion of the caller history and a 
second portion of the caller history, wherein the first portion and the second portion hold 
substantially the same portion of the caller history; and 

comparing the callers of the first portion of the caller history to the callers 
of the second portion of the caller history. 
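The dividing and comparing steps of claim 4 can be pictured with a minimal software sketch. It assumes the caller history is a sequence of integer caller IDs ordered oldest to newest with an even length. The function name `halves_match` is hypothetical and not taken from the claims.

```python
from typing import Sequence


def halves_match(history: Sequence[int]) -> bool:
    """Compare the two equal halves of a caller history.

    Assumptions: `history` holds caller IDs ordered oldest -> newest and has
    even length. A match suggests the callers repeat with period len(history) // 2.
    """
    if len(history) % 2 != 0:
        raise ValueError("history length must be even")
    half = len(history) // 2
    first, second = history[:half], history[half:]
    # Element-wise comparison of the caller IDs in the two portions.
    return all(a == b for a, b in zip(first, second))


print(halves_match([1, 2, 3, 1, 2, 3]))  # True: pattern 1, 2, 3 repeats
print(halves_match([1, 2, 3, 1, 3, 2]))  # False: no repetition at this period
```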

5. The method of claim 4, 

wherein said storing the caller history uses a history register, wherein the history 
register is operable to be divided into two substantially equal parts. 

6. The method of claim 5,

wherein said comparing operates to compare callers in the first part of the history 
register to the callers in the second part of the history register. 

7. The method of claim 5, 

wherein each of the plurality of callers has a unique identification, wherein the 
unique identification is operable to be used in the caller history. 

8. The method of claim 7, 

wherein the history register is operable to store the unique identification of each
of the two or more callers calling the shared functional unit by operating analogously to a
shift register.
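A history register that operates analogously to a shift register (claims 5 and 8) might be modeled in software as a fixed-depth queue of caller IDs. This is a hypothetical sketch: the class name `HistoryRegister` and its methods are illustrative assumptions, and a hardware implementation would use registers rather than a Python `deque`.

```python
from collections import deque
from typing import Deque, List, Tuple


class HistoryRegister:
    """Fixed-depth caller history that behaves like a shift register.

    Hypothetical sketch: each new caller ID is shifted in at the newest end,
    and once the register is full the oldest ID is shifted out.
    """

    def __init__(self, depth: int) -> None:
        # Even depth so the register splits into two equal parts (cf. claim 5).
        if depth <= 0 or depth % 2 != 0:
            raise ValueError("depth must be a positive even number")
        self._cells: Deque[int] = deque(maxlen=depth)

    def shift_in(self, caller_id: int) -> None:
        # deque(maxlen=...) drops the oldest entry when the register is full.
        self._cells.append(caller_id)

    def contents(self) -> List[int]:
        return list(self._cells)  # oldest -> newest

    def is_full(self) -> bool:
        return len(self._cells) == self._cells.maxlen

    def halves(self) -> Tuple[List[int], List[int]]:
        # Oldest half and newest half of a full register.
        if not self.is_full():
            raise RuntimeError("register is not yet full")
        cells = self.contents()
        half = len(cells) // 2
        return cells[:half], cells[half:]


reg = HistoryRegister(depth=4)
for caller_id in [7, 1, 2, 1, 2]:
    reg.shift_in(caller_id)
print(reg.contents())  # [1, 2, 1, 2]  (caller 7 has been shifted out)
print(reg.halves())    # ([1, 2], [1, 2])
```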

9. The method of claim 7, 

wherein said comparing the callers comprises comparing the unique 
identifications of the callers in the first portion of the caller history to the unique 
identifications of the callers in the second portion of the caller history. 

10. The method of claim 4,

wherein said comparing the callers of the first portion of the caller history to the
callers of the second portion of the caller history operates to select a periodic portion of
the caller history.

11. The method of claim 10, further comprising:

using a multiplexer to predict the next caller of the shared functional unit after 
selecting the periodic portion of the caller history. 
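Claims 10 and 11 can be illustrated with a sketch that selects a periodic portion of the history and then uses a multiplexer to pick the predicted caller. Assumptions not stated in the claims: each candidate period `p` is tested by comparing the newest `2*p` entries split into two equal portions, the shortest matching period wins, and the multiplexer is modeled as simple indexed selection. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_periodic_portion(history: Sequence[int]) -> Optional[List[int]]:
    """Return the shortest repeating tail of the history, or None.

    Assumption: for each candidate period p, the newest 2*p entries are split
    into two equal portions and compared; the first p that matches is taken
    as the periodic portion.
    """
    n = len(history)
    for p in range(1, n // 2 + 1):
        older = list(history[n - 2 * p : n - p])
        newer = list(history[n - p :])
        if older == newer:
            return newer
    return None


def mux(inputs: Sequence[int], select: int) -> int:
    """Software stand-in for a hardware multiplexer: route one input to the output."""
    return inputs[select]


def predict_next_caller(history: Sequence[int]) -> Optional[int]:
    portion = select_periodic_portion(history)
    if portion is None:
        return None  # no pattern detected; the caller must fall back (e.g. arbitrate)
    # If the pattern repeats with period p, the next call should match the call
    # p steps ago, which is the oldest entry of the periodic portion.
    return mux(portion, 0)


print(predict_next_caller([3, 1, 2, 1, 2]))     # 1
print(predict_next_caller([1, 2, 3, 1, 2, 3]))  # 1
print(predict_next_caller([1, 2, 3]))           # None
```

Preferring the shortest matching period is only one heuristic; a short spurious match (for example a caller that happens to repeat once) can mask a longer true period, so a real design may require a minimum history length or confidence before trusting a prediction.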

12. The method of claim 1, 

wherein the parallel execution unit comprises one or more of: 
an FPGA; 

a programmable hardware element; 
a reconfigurable logic unit;

a nonconfigurable hardware element; 

an ASIC; 

a computer comprising a plurality of processors; and 
any other computing device capable of executing multiple threads in 
parallel. 

13. The method of claim 1, 

wherein the state information comprises one or more of: 
execution state; 
values of any variable; 
previous inputs; 
previous outputs; and 

any other information related to execution of a node in a dataflow diagram.
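The state information enumerated in claim 13 might be held per caller and swapped into the shared functional unit when the predicted caller is loaded. The sketch below is hypothetical: `CallerState`, `SharedNodeContext`, and the save-then-swap policy are assumptions for illustration, not the claimed implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CallerState:
    """Per-caller context for a shared node (fields follow the list in claim 13)."""
    execution_state: str = "idle"
    variables: Dict[str, Any] = field(default_factory=dict)
    previous_inputs: List[Any] = field(default_factory=list)
    previous_outputs: List[Any] = field(default_factory=list)


class SharedNodeContext:
    """Hypothetical state store: one saved CallerState per caller ID, plus the
    single active state the shared functional unit currently executes with."""

    def __init__(self) -> None:
        self._saved: Dict[int, CallerState] = {}
        self.active_caller: Optional[int] = None
        self.active: CallerState = CallerState()

    def load(self, caller_id: int) -> None:
        # Save the outgoing caller's state, then swap in the (predicted) next
        # caller's state, creating a fresh one on that caller's first call.
        if self.active_caller is not None:
            self._saved[self.active_caller] = self.active
        self.active = self._saved.pop(caller_id, CallerState())
        self.active_caller = caller_id


ctx = SharedNodeContext()
ctx.load(1)
ctx.active.variables["acc"] = 10
ctx.load(2)
ctx.active.variables["acc"] = 5
ctx.load(1)
print(ctx.active.variables["acc"])  # 10: caller 1's variables were restored
```

Issuing `load` as soon as a next caller is predicted, rather than when the call actually arrives, is what would hide the state-switch latency in this model.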

14. The method of claim 1,

wherein the run-time prediction operates to optimize execution of nodes in a
dataflow program.

15. The method of claim 1, 

wherein the shared functional unit and the plurality of callers are generated from a 
dataflow program. 

16. A method for run-time call prediction for resolving resource contention 
between two or more callers of a shared node in a dataflow program, the method 
comprising: 

detecting a calling pattern by a plurality of callers of the shared functional unit; 
predicting a next caller out of the plurality of callers of the shared functional unit; and

loading state information associated with the next caller out of the plurality of callers;

wherein the shared functional unit and the plurality of callers are operable to 
execute in parallel on a parallel execution unit. 

17. The method of claim 16, 

wherein the dataflow program comprises a plurality of nodes, wherein one or
more of the plurality of nodes are operable to be called by two or more nodes of the 
plurality of nodes. 

18. The method of claim 16, 

wherein the run-time call prediction operates to optimize execution of the nodes 
in the dataflow program. 

19. The method of claim 16, 

wherein the dataflow program executes on the parallel execution unit, wherein the
parallel execution unit comprises one or more of:
an FPGA; 

a programmable hardware element;
a reconfigurable logic unit; 
a nonconfigurable hardware element; 
an ASIC; 

a computer comprising a plurality of processors; and 
any other computing device capable of executing multiple threads in 
parallel. 

20. The method of claim 16, further comprising:
storing a caller history of the shared node; 
wherein said detecting comprises: 

dividing the caller history into a first portion of the caller history and a 
second portion of the caller history, wherein the first portion and the second portion hold 
substantially the same portion of the caller history; and 

comparing the callers of the first portion of the caller history to the callers 
of the second portion of the caller history. 

21. The method of claim 20, 

wherein each of the plurality of callers has a unique identification, wherein the 
unique identification is operable to be used in the caller history. 

22. The method of claim 21, 

wherein said storing the caller history uses a history register, wherein the history
register is operable to store the unique identification of each of the two or more callers
calling the shared functional unit by operating analogously to a shift register.

23. The method of claim 21,

wherein said comparing the callers comprises comparing the unique 
identifications of the callers in the first portion of the caller history to the unique 
identifications of the callers in the second portion of the caller history. 

24. The method of claim 20, 

wherein said comparing the callers of the first portion of the caller history to the
callers of the second portion of the caller history operates to select a periodic portion of
the caller history.

25. The method of claim 16,

wherein the shared functional unit and the plurality of callers are generated from 
the dataflow program. 

26. A memory medium comprising instructions to generate a program to
perform run-time call prediction of a next caller of a shared functional unit, wherein the
program is intended for deployment on a parallel execution unit, wherein the program is
executable to:

detect a calling pattern of a plurality of callers of the shared functional unit, 
wherein the shared functional unit is operable to be called by two or more callers out of 
the plurality of callers; 

predict the next caller out of the plurality of callers of the shared functional unit; and

load state information associated with the next caller out of the plurality of callers;

wherein the shared functional unit and the plurality of callers are operable to 
execute in parallel on the parallel execution unit. 

27. The memory medium of claim 26, 

wherein the run-time call prediction operates to optimize execution of nodes
in a dataflow program.

28. The memory medium of claim 26, 

wherein the program executes on the parallel execution unit, wherein the
parallel execution unit comprises one or more of:
an FPGA; 

a programmable hardware element;
a reconfigurable logic unit; 
a nonconfigurable hardware element; 
an ASIC; 

a computer comprising a plurality of processors; and 
any other computing device capable of executing multiple threads in 
parallel. 

29. The memory medium of claim 26, wherein the program is further
executable to:

store a caller history of the shared functional unit;
wherein said detecting comprises: 

dividing the caller history into a first portion of the caller history and a 
second portion of the caller history, wherein the first portion and the second portion hold 
substantially the same portion of the caller history; and 

comparing the callers of the first portion of the caller history to the callers 
of the second portion of the caller history. 

30. The memory medium of claim 29, 

wherein each of the plurality of callers has a unique identification, wherein the 
unique identification is operable to be used in the caller history. 

31. The memory medium of claim 30,

wherein said storing the caller history uses a history register, wherein the history
register is operable to store the unique identification of each of the two or more callers
calling the shared functional unit by operating analogously to a shift register.

32. The memory medium of claim 30, 

wherein said comparing the callers comprises comparing the unique 
identifications of the callers in the first portion of the caller history to the unique 
identifications of the callers in the second portion of the caller history. 

33. The memory medium of claim 29,

wherein said comparing the callers of the first portion of the caller history to the
callers of the second portion of the caller history operates to select a periodic portion of
the caller history.

34. The memory medium of claim 26, 
wherein the program comprises one or more of: 
program instructions;
digital logic; and

any type of hardware description used to configure the parallel execution unit.

35. The memory medium of claim 26, 

wherein the shared functional unit and the plurality of callers are generated from
a dataflow program.

36. The memory medium of claim 26, 

wherein the program comprises a control and arbitration logic unit that is operable
to perform said detect, said predict, and said load.
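A control and arbitration logic unit, as named in claims 36 and 45, must still resolve contention when the prediction is wrong or unavailable. The sketch below assumes a round-robin fallback, which the claims do not specify; `RoundRobinArbiter` and `grant` are hypothetical names. The predicted caller is granted first when it is actually requesting, since its state would already have been loaded.

```python
from typing import Optional, Sequence, Set


class RoundRobinArbiter:
    """Fallback arbiter (an assumption): grants pending callers in a rotating order."""

    def __init__(self, caller_ids: Sequence[int]) -> None:
        self._order = list(caller_ids)
        self._next = 0  # index in _order where the next search starts

    def pick(self, pending: Set[int]) -> Optional[int]:
        for offset in range(len(self._order)):
            idx = (self._next + offset) % len(self._order)
            if self._order[idx] in pending:
                self._next = (idx + 1) % len(self._order)
                return self._order[idx]
        return None  # nobody is requesting the shared unit


def grant(pending: Set[int], predicted: Optional[int],
          arbiter: RoundRobinArbiter) -> Optional[int]:
    """Grant the shared unit to the predicted caller if it is requesting;
    otherwise fall back to round-robin arbitration."""
    if predicted is not None and predicted in pending:
        return predicted  # prediction hit: state was loaded ahead of time
    return arbiter.pick(pending)


arb = RoundRobinArbiter([1, 2, 3])
print(grant({2, 3}, predicted=3, arbiter=arb))     # 3 (prediction hit)
print(grant({1, 2}, predicted=None, arbiter=arb))  # 1 (no prediction, round robin)
print(grant({1, 2}, predicted=3, arbiter=arb))     # 2 (misprediction, round robin)
```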

37. A system for run-time optimization of a dataflow program, the system 
comprising: 

a parallel execution unit; 
a plurality of callers; 

a shared functional unit, wherein the shared functional unit is operable to be 
called by two or more callers out of the plurality of callers, wherein the shared functional 
unit and the plurality of callers are operable to execute in parallel on the parallel 
execution unit; 

an optimization algorithm, wherein the optimization algorithm is operable to: 

detect a calling pattern of the plurality of callers of the shared functional unit;

predict a next caller out of the plurality of callers of the shared
functional unit; and

load state information associated with the next caller out of the
plurality of callers.

38. The system of claim 37, 

wherein the parallel execution unit comprises one or more of:
an FPGA; 

a programmable hardware element;
a reconfigurable logic unit; 
a nonconfigurable hardware element; 
an ASIC; 

a computer comprising a plurality of processors; and 
any other computing device capable of executing multiple threads in 
parallel. 

39. The system of claim 37, wherein the optimization algorithm is further 
operable to: 

store a caller history of the shared functional unit;
wherein said detecting comprises: 

dividing the caller history into a first portion of the caller history and a 
second portion of the caller history, wherein the first portion and the second portion hold 
substantially the same portion of the caller history; and

comparing the callers of the first portion of the caller history to the callers 
of the second portion of the caller history. 

40. The system of claim 39, 

wherein each of the plurality of callers has a unique identification, wherein the 
unique identification is operable to be used in the caller history. 

41. The system of claim 40,

wherein said storing the caller history uses a history register, wherein the history
register is operable to store the unique identification of each of the two or more callers
calling the shared functional unit by operating analogously to a shift register.

42. The system of claim 40,

wherein said comparing the callers comprises comparing the unique
identifications of the callers in the first portion of the caller history to the unique 
identifications of the callers in the second portion of the caller history. 

43. The system of claim 39,

wherein said comparing the callers of the first portion of the caller history to the
callers of the second portion of the caller history operates to select a periodic portion of
the caller history.

44. The system of claim 37, 

wherein the shared functional unit and the plurality of callers are generated from 
the dataflow program. 

45. The system of claim 37, 

wherein the optimization algorithm is comprised in a control and arbitration logic
unit that is operable to perform said detect, said predict, and said load.